CWE-230 Improper handling of missing values #947
few cosmetics, +1 otherwise.
## Non-Compliant Code Example | ||
This noncompliant code example [[2024 docs.python.org]](https://docs.python.org/3/reference/expressions.html#value-comparisons) attempts a direct comparison with `NaN` in `_value == float("NaN")`. |
This noncompliant code example [[2024 docs.python.org]](https://docs.python.org/3/reference/expressions.html#value-comparisons) attempts a direct comparison with `NaN` in `_value == float("NaN")`. | |
The `noncompliant01.py` code example [[2024 docs.python.org]](https://docs.python.org/3/reference/expressions.html#value-comparisons) attempts a direct comparison with `NaN` in `_value == float("NaN")`. |
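To illustrate why the direct comparison fails, here is a minimal sketch. The function names `is_missing_broken` and `is_missing` are hypothetical and are not taken from `noncompliant01.py`; only the standard-library `math.isnan` is assumed.

```python
import math


def is_missing_broken(value: str) -> bool:
    """Hypothetical guard that mirrors the flawed `== float("NaN")` check."""
    _value = float(value)
    # NaN never compares equal to anything, including itself,
    # so this expression is always False.
    return _value == float("NaN")


def is_missing(value: str) -> bool:
    """Hypothetical guard that detects NaN reliably."""
    return math.isnan(float(value))


print(is_missing_broken("NaN"))  # False: the guard never fires
print(is_missing("NaN"))         # True
print(is_missing("1.5"))         # False
```

A guard written with `==` silently lets `NaN` through, which is why a dedicated check such as `math.isnan` is usually preferred.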
## Compliant Solution | ||
The `compliant01.py` the method Decimal.quantize is used to gain control over known rounding errors in floating point values. |
The `compliant01.py` the method Decimal.quantize is used to gain control over known rounding errors in floating point values. | |
The `compliant01.py` the method `Decimal.quantize` is used to gain control over known rounding errors in floating point values. |
The sentence also needs a small rewording. Maybe something like this?
The `compliant01.py` the method Decimal.quantize is used to gain control over known rounding errors in floating point values. | |
In the `compliant01.py` code example, the method `Decimal.quantize` is used to gain control over known rounding errors in floating-point values.
Finished the review. The code looks good overall; I had only one small comment on it, but there were also some small problems in the README.
@@ -0,0 +1,148 @@ | |||
# CWE-230: Improper Handling of Missing Values | |||
In python, some datasets use `NaN` (not-a-number) to represent the missing data. This can be problematic as the `NaN` values are unordered. The `NaN` value should be stripped before as they can cause surprising or undefined behaviours in the statistics functions that sort or count occurrences [[2024 doc.python.org]](https://docs.python.org/3/library/statistics.html) Any ordered comparison of a number to a not-a-number value are `False`. A counter-intuitive implication is that `not-a-number` values are not equal to themselves. |
Since we try to have one summary sentence in the first paragraph for the search engines, perhaps something like this could work?
In python, some datasets use `NaN` (not-a-number) to represent the missing data. This can be problematic as the `NaN` values are unordered. The `NaN` value should be stripped before as they can cause surprising or undefined behaviours in the statistics functions that sort or count occurrences [[2024 doc.python.org]](https://docs.python.org/3/library/statistics.html) Any ordered comparison of a number to a not-a-number value are `False`. A counter-intuitive implication is that `not-a-number` values are not equal to themselves. | |
`NaN` values should be stripped from the data beforehand, as they can cause surprising or undefined behaviour in the statistics functions that sort or count occurrences [[2024 doc.python.org]](https://docs.python.org/3/library/statistics.html).
In Python, some datasets use `NaN` (not-a-number) to represent missing data. This can be problematic because `NaN` values are unordered. Any ordered comparison of a number to a not-a-number value is `False`. A counter-intuitive implication is that not-a-number values are not equal to themselves.
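As a rough illustration of stripping `NaN` before calling the statistics functions, consider the sketch below. The helper `strip_nan` and the sample `readings` are hypothetical and do not come from the PR; only `math.isnan` and `statistics.median` from the standard library are assumed.

```python
import math
import statistics


def strip_nan(values):
    """Hypothetical helper: drop NaN entries before doing statistics."""
    return [v for v in values if not math.isnan(v)]


readings = [3.0, float("nan"), 1.0, 2.0]
cleaned = strip_nan(readings)

print(cleaned)                     # [3.0, 1.0, 2.0]
print(statistics.median(cleaned))  # 2.0
# statistics.median(readings) is left out on purpose: with a NaN in the
# data the result depends on where the NaN ends up after sorting.
```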
## Compliant Solution | ||
The `compliant01.py` the method Decimal.quantize is used to gain control over known rounding errors in floating point values. | ||
The decision by the balance_is_positive method is to `ROUND_DOWN` instead of the default `ROUND_HALF_EVEN`. |
The decision by the balance_is_positive method is to `ROUND_DOWN` instead of the default `ROUND_HALF_EVEN`. | |
The decision by the `balance_is_positive` method is to `ROUND_DOWN` instead of the default `ROUND_HALF_EVEN`. | |
`Decimal` throws a `decimal.InvalidOperation` for `NaN` values, the controlled rounding causes only `"0.01"` to return `True`. | ||
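The behaviour described above can be sketched as follows. This is not the contents of `compliant01.py`: the name `balance_is_positive` is taken from the review, but the signature, the two-decimal quantum and the comparison against `Decimal("0.00")` are assumptions made for illustration.

```python
from decimal import ROUND_DOWN, Decimal, InvalidOperation


def balance_is_positive(value: str) -> bool:
    """Sketch: True if at least one whole cent remains after rounding down."""
    amount = Decimal(value).quantize(Decimal("0.01"), rounding=ROUND_DOWN)
    # Ordered comparison of a Decimal NaN raises InvalidOperation
    # under the default context instead of silently returning False.
    return amount > Decimal("0.00")


print(balance_is_positive("0.01"))   # True
print(balance_is_positive("0.001"))  # False: rounded down to 0.00
try:
    balance_is_positive("NaN")
except InvalidOperation:
    print("NaN rejected")
```

With `ROUND_DOWN`, any amount below one cent is truncated to `0.00`, whereas the default `ROUND_HALF_EVEN` would round a value such as `0.009` up to `0.01`.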
In `compliant02.py` we use the math.isnan to very if the value passed is a valid `float` value. |
In `compliant02.py` we use the math.isnan to very if the value passed is a valid `float` value. | |
In `compliant02.py` we use the `math.isnan` to verify if the value passed is a valid `float` value. | |
This behavior is compliant with IEEE 754[[2024 Wikipedia]](https://en.wikipedia.org/wiki/IEEE_754) a hardware induced compromise. | ||
The [example01.py](example01.py) code demonstrates various comparisons of `float('NaN')` all resulting in `False`. | ||
```python |
```python | |
```python | |
# SPDX-FileCopyrightText: OpenSSF project contributors | |
# SPDX-License-Identifier: MIT |
*[noncompliant01.py](noncompliant01.py):* | ||
```python | ||
""" Non-compliant Code Example """ |
""" Non-compliant Code Example """ | |
# SPDX-FileCopyrightText: OpenSSF project contributors | |
# SPDX-License-Identifier: MIT | |
""" Non-compliant Code Example """ |
    _value = float(value)
    if math.isnan(_value) or _value is None:
        raise ValueError("Expected a float")
    if _value <= 0:
This might be slightly confusing because `print(balance_is_positive("0.001"))` returns `False` in `compliant01.py` but returns `True` here. If we assume the balance uses cents, we could just change the threshold in this condition:
if _value <= 0:
if _value < 0.01:
If we do so, the same should be done in `noncompliant01.py` so that this part of the code remains unchanged.
|Tool|Version|Checker|Description|
|:----|:----|:----|:----|
|Bandit|1.7.4 on Python 3.10.4|Not Available||
|flake8|flake8-4.0.1 on python 3.10.4||FS002 '.format' used|
I ran flake8 on my machine and no error was shown. I don't see `.format` used in any code example, so I assume it's a leftover from an old version of the rule.
|flake8|flake8-4.0.1 on python 3.10.4||FS002 '.format' used| | |
|flake8|flake8-4.0.1 on python 3.10.4|Not Available|| |
|||
|:---|:---|
|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|[IDS06-J. Exclude unsanitized user input from format strings](https://wiki.sei.cmu.edu/confluence/display/java/IDS06-J.+Exclude+unsanitized+user+input+from+format+strings)|
This SEI CERT rule seems unrelated. I have found another one that talks specifically about NaN values:
|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|[IDS06-J. Exclude unsanitized user input from format strings](https://wiki.sei.cmu.edu/confluence/display/java/IDS06-J.+Exclude+unsanitized+user+input+from+format+strings)| | |
|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|[NUM07-J. Do not attempt comparisons with NaN](https://wiki.sei.cmu.edu/confluence/display/java/NUM07-J.+Do+not+attempt+comparisons+with+NaN)| | |
Before this commit the wording was that modifying list works but is not recommended. But it works as long as no two consecutive elements are deleted, otherwise part of elements is not checked at all without any exceptions raised. Changed README.md, compliant01.py and noncompliant01.py to demonstrate that. Signed-off-by: Kyrylo Yatsenko <[email protected]> Signed-off-by: Helge Wehder <[email protected]> Signed-off-by: ewlxdnx <[email protected]>
…f#936) * guide Signed-off-by: balteravishay <[email protected]> * guide Signed-off-by: balteravishay <[email protected]> * remove temp files Signed-off-by: balteravishay <[email protected]> * lint Signed-off-by: balteravishay <[email protected]> * Update docs/Security-Focused-Guide-for-AI-Code-Assistant-Instructions.md Signed-off-by: Avishay Balter <[email protected]> * Update docs/Security-Focused-Guide-for-AI-Code-Assistant-Instructions.md Signed-off-by: Avishay Balter <[email protected]> --------- Signed-off-by: balteravishay <[email protected]> Signed-off-by: Avishay Balter <[email protected]> Signed-off-by: ewlxdnx <[email protected]>